Skip to main content

FNP - Data Structures - CRDT Conflict-Free Merge Semantics

Summary (Explain Like I’m 5)

Imagine you and your friend are writing a story at the same time on different sheets of paper: Problem: You both add sentences. Later you combine papers. Which sentence comes first? What if you both deleted something? Solution - CRDT:
  • Each sentence gets a unique invisible timestamp (position identifier)
  • When combining papers, the order is always the same, no matter who combines them
  • No conflicts because positions never overlap
  • Everyone ends up with the exact same story
In FNP: When 100 users edit simultaneously, LSEQ gives each edit a unique position. When merges happen, all 100 users see identical results - no voting, no conflicts.

Technical Deep Dive

CRDT (Conflict-free Replicated Data Type) is a data structure that guarantees:
  1. Eventual Consistency → All replicas reach same state
  2. Commutativity → Order of operations doesn’t matter
  3. Idempotence → Applying operation twice = applying once
  4. Deterministic Merge → No conflicts, no user intervention needed
FNP uses LSEQ - a CRDT for ordered sequences with three key properties:

1. Commutativity Proof (Why order doesn’t matter)

For any two operations op₁ and op₂:
Claim: apply(apply(doc, op₁), op₂) = apply(apply(doc, op₂), op₁)

Proof Sketch:
- LSEQ assigns unique position identifiers to each character
- Example: Alice inserts 'H' at position 1.0, Bob inserts 'i' at position 2.0
- Positions are generated from: (seq_num, site_id, clock)
- Since 1.0 ≠ 2.0 and order within same document is total order:
  - Alice's 'H' always comes before Bob's 'i' after merge
  - Sequence is deterministic regardless of merge order
  - If Alice merges first: ['H', 'i']
  - If Bob merges first: ['H', 'i'] ← SAME!
Why this matters: No two users’ edits can land on same position. Positions form total ordering. Therefore, merge order is irrelevant.

2. Idempotence Guarantee

For operation op applied to doc D:

First apply: D' = apply(D, op)
Second apply: D'' = apply(D', op)

CRDT Property: D' = D''

Why with LSEQ:
- Operation includes unique operation_id (site_id + clock)
- If same operation arrives twice, second application:
  1. Checks if position already exists in doc
  2. If exists (already applied), skips
  3. If not exists, inserts

This prevents "double-insert" issues in network retransmissions
Scenario: Network delay causes operation retransmit → arrives twice → LSEQ applies once (idempotent).

3. Deterministic Merge Algorithm

Algorithm: Merge LSEQ States (No conflicts)

Input: doc_alice, doc_bob [two replicas, both valid LSEQ states]
Output: doc_merged [merged state, identical for all viewers]

Steps:
1. Collect all positions from both docs: pos_all = pos_alice ∪ pos_bob
2. Sort positions by: (level, digit_sequence, site_id)
3. For each position in sorted order:
   - Take character from alice or bob (whichever has it)
   - If both have same position (concurrent insert):
     * Break tie by site_id (Alice_id vs Bob_id)
     * Deterministic tiebreaker ensures same order everywhere
4. Construct: doc_merged = positions_in_order

Guarantees:
- ✓ All concurrent inserts ranked by (position, site_id)
- ✓ No "conflicts" to resolve
- ✓ Same output regardless of merge order
- ✓ All replicas converge to identical state
Example Merge:
Alice's view:  ["H", "e", "l", "l", "o"]
               pos:   1.0  2.0  3.0  4.0  5.0

Bob's view:    ["H", "i", "e", "l", "l", "o"]
               pos:   1.0  1.5  2.0  3.0  4.0  5.0

Merge Process:
1. Collect positions: {1.0, 1.5, 2.0, 3.0, 4.0, 5.0}
2. Sort: [1.0, 1.5, 2.0, 3.0, 4.0, 5.0]
3. Map to chars:
   - 1.0 → 'H' (both)
   - 1.5 → 'i' (Bob only)
   - 2.0 → 'e' (Alice) vs 'e' (Bob) → same
   - 3.0 → 'l' (both)
   - 4.0 → 'l' (both)
   - 5.0 → 'o' (both)
4. Result: ["H", "i", "e", "l", "l", "o"] = "Hiello"

Both Alice and Bob see "Hiello" → converged! ✓

4. Tombstone Deletion (No erasure, just marking)

Problem: If Alice deletes position 2.0, Bob doesn't know position 2.0 existed
Solution: Don't erase, just mark as deleted (tombstone)

Insert: (pos: 2.0, char: 'i', visible: true)
Delete: (pos: 2.0, char: 'i', visible: false) ← same position, mark invisible

When merging:
- If Alice has: (2.0, 'i', false)
- If Bob has: (2.0, 'i', true)
- Reconcile: Apply delete_timestamp check
- Since delete happened later (higher causal clock), mark invisible
- Both replicas show invisible
→ No conflict! Causality resolves it.
Causal ordering ensures: Later delete timestamp wins over earlier insert timestamp.

Mermaid Diagrams

Key Terms

  • Commutativity → Operator order irrelevant; a + b = b + a
  • Idempotence → f(f(x)) = f(x); applying twice = applying once
  • Eventual Consistency → All replicas eventually reach same state
  • Deterministic Merge → Same operation order → same result always
  • Tombstone → Mark as deleted, don’t erase (LSEQ keeps history)
  • Total Ordering → Unique positions create strict ordering (no ties)
  • Causality → Happens-before relation defines operation order
  • Conflict-Free → No user resolution needed; math handles it

Q/A

Q: Why can’t CRDT operations conflict? A: CRDT position IDs are unique (site_id + clock). No two operations occupy same position. Total ordering of positions means merge order is irrelevant—results identical regardless. Q: What if two users insert at same position simultaneously? A: LSEQ breaks ties with site_id (Alice vs Bob). Deterministic tiebreaker ensures all replicas rank them same way. One comes first (by site_id), no ambiguity. Q: Does Commutativity mean I can apply operations in any order? A: Yes for CRDT operations. Apply op₁ then op₂ = Apply op₂ then op₁. This enables replicas with different operation order to converge identically. Q: What’s the difference between Commutativity and Idempotence? A: Commutativity: op₁ ∘ op₂ = op₂ ∘ op₁ (order). Idempotence: op ∘ op = op (repetition). CRDT ensures both—order doesn’t matter AND duplicate applications don’t break things. Q: Why use tombstones for deletion instead of actually erasing? A: If Alice deletes, Bob hasn’t received delete yet. If actually erased, Bob’s replica diverges. Tombstones mark invisible but keep history. When Bob’s delete arrives, causality ensures consistent deletion. Q: Can CRDT handle concurrent deletes of same char? A: Yes. Both delete operations target same position_id. Both increment delete_timestamp. Merge sees two “delete” markers—same position marked deleted. Result: converged invisibility. Q: What breaks Commutativity? A: Non-CRDT operations break it. Example: “insert at position 5” (absolute position). If Alice inserts first, Bob’s position 5 is now different. Merge order matters—not commutative. CRDT fixes this.

Example / Analogy

Collaborative Playlist Analogy: Without CRDT (conflicts):
  • Alice adds song “A” at position 1
  • Bob adds song “B” at position 1
  • When merging, which song is at position 1? Conflict!
  • Manager must choose: “Alice wins” or “Bob wins”
With CRDT (conflict-free):
  • Alice adds “A” at position (1.0, Alice_ID)
  • Bob adds “B” at position (1.0, Bob_ID)
  • Merge compares: (1.0, Alice_ID) < (1.0, Bob_ID)?
  • Yes! → Order is: [A, B]
  • Even if Bob merges first, positions are absolute
  • Result: All listeners see [A, B] ← no conflict!
FNP Real-world: 100 users editing 1 document simultaneously. LSEQ assigns each edit unique position. Final document is deterministically identical on all devices—no voting, no conflicts, no “save conflicts” dialog boxes.
Cross-References: LSEQ CRDT Identifiers, Protocol Flow, Distributed Systems Category: Data Structures | CRDT | Consensus | Distributed Algorithms Difficulty: Advanced ⭐⭐⭐⭐⭐ Updated: 2025-11-28